A deep learning pipeline for product recognition on store shelves
Recognition of grocery products on store shelves poses peculiar challenges.
First, the task mandates the recognition of an extremely high number of
different items, on the order of several thousand for small-to-medium shops, with
many of them featuring small inter- and intra-class variability. Then, available
product databases usually include just one or a few studio-quality images per
product (referred to herein as reference images), whilst at test time
recognition is performed on pictures displaying a portion of a shelf containing
several products and taken in the store by cheap cameras (referred to as query
images). Moreover, as the items on sale in a store as well as their appearance
change frequently over time, a practical recognition system should handle
seamlessly new products/packages. Inspired by recent advances in object
detection and image retrieval, we propose to leverage state-of-the-art
object detectors based on deep learning to obtain an initial product-agnostic
item detection. Then, we pursue product recognition through a similarity search
between global descriptors computed on reference and cropped query images. To
maximize performance, we learn an ad-hoc global descriptor with a CNN trained on
reference images using an image-embedding loss. Our system is
computationally expensive at training time but performs recognition rapidly
and accurately at test time.
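The recognition step described above, matching global descriptors of cropped query items against per-product reference descriptors, can be sketched as a nearest-neighbor search. This is a minimal NumPy stand-in: the descriptors would actually come from the learned CNN, and the function names here are illustrative, not the authors' API.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Global descriptors are typically L2-normalized so that the
    # dot product between two descriptors equals cosine similarity.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def recognize(query_descriptors, reference_descriptors):
    """Assign each cropped query item to its most similar reference product.

    query_descriptors: (Q, D) descriptors of detected item crops.
    reference_descriptors: (R, D) one descriptor per catalog (reference) image.
    Returns, for each query crop, the index of the best-matching reference.
    """
    q = l2_normalize(np.asarray(query_descriptors, dtype=np.float64))
    r = l2_normalize(np.asarray(reference_descriptors, dtype=np.float64))
    similarity = q @ r.T  # cosine-similarity matrix of shape (Q, R)
    return similarity.argmax(axis=1)

# Toy example: 3 reference products, 2 detected crops.
refs = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])
queries = np.array([[0.9, 0.1, 0.0],   # close to product 0
                    [0.0, 0.2, 0.8]])  # close to product 2
print(recognize(queries, refs))  # -> [0 2]
```

Because new products only require computing one reference descriptor (no retraining of the detector), this retrieval formulation handles catalog changes naturally.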
Real-time self-adaptive deep stereo
Deep convolutional neural networks trained end-to-end are the
state-of-the-art methods to regress dense disparity maps from stereo pairs.
These models, however, suffer a notable decrease in accuracy when exposed
to scenarios significantly different from the training set (e.g., real vs.
synthetic images). We argue that it is extremely unlikely to gather
enough samples to achieve effective training/tuning in any target domain, thus
making this setup impractical for many applications. Instead, we propose to
perform unsupervised and continuous online adaptation of a deep stereo network,
which allows for preserving its accuracy in any environment. However, this
strategy is extremely computationally demanding and thus prevents real-time
inference. We address this issue by introducing a new lightweight, yet effective,
deep stereo architecture, Modularly ADaptive Network (MADNet) and developing a
Modular ADaptation (MAD) algorithm, which independently trains sub-portions of
the network. By deploying MADNet together with MAD we introduce the first
real-time self-adaptive deep stereo system enabling competitive performance on
heterogeneous datasets.
Comment: Accepted at CVPR 2019 as an oral presentation. Code available at
https://github.com/CVLAB-Unibo/Real-time-self-adaptive-deep-stere
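The core idea of Modular ADaptation, updating only one sub-portion of the network per frame instead of back-propagating through the whole model, can be sketched with a simple scheduler. This is a toy illustration: the class name, the softmax sampling rule, and the moving-average reward are my simplifications, not the paper's exact heuristic.

```python
import numpy as np

rng = np.random.default_rng(0)

class ModularAdaptation:
    """Toy sketch of MAD-style module selection.

    Keep a score per sub-module estimating how much updating it has
    reduced the loss in the past, and each frame adapt only one module
    sampled according to those scores, keeping per-frame cost low.
    """

    def __init__(self, num_modules, temperature=1.0):
        self.scores = np.zeros(num_modules)
        self.temperature = temperature

    def pick_module(self):
        # Softmax sampling: modules whose past updates helped more
        # get selected more often, but every module keeps nonzero mass.
        logits = self.scores / self.temperature
        p = np.exp(logits - logits.max())
        p /= p.sum()
        return int(rng.choice(len(self.scores), p=p))

    def record(self, module, loss_before, loss_after):
        # Exponential moving average of the observed loss reduction.
        reward = loss_before - loss_after
        self.scores[module] = 0.9 * self.scores[module] + 0.1 * reward

mad = ModularAdaptation(num_modules=5)
m = mad.pick_module()  # choose one sub-portion to adapt on this frame
# In the real system an unsupervised (e.g., photometric) loss would be
# computed before and after updating only module m's parameters.
mad.record(m, loss_before=1.0, loss_after=0.8)
```

The unsupervised loss comes from the stereo pair itself (no ground-truth disparity), which is what makes continuous online adaptation practical.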
Computer Vision and Deep Learning for retail store management
The management of a supermarket or retail store is a quite complex process that requires the coordinated execution of many different tasks (e.g., shelf management, inventory, surveillance, customer support...). Thanks to recent advancements in technology, many of those repetitive tasks can be completely or partially automated. One key technological requirement is the ability to understand a scene based only on information acquired by a camera; for this reason, we will focus on computer vision techniques to solve management problems inside a grocery retail store. We will address two main problems: (a) how to automatically detect and recognize products exposed on store shelves and (b) how to obtain a reliable 3D reconstruction of an environment using only information coming from a camera. We will tackle (a) both in a constrained version, where the objective is to verify the compliance of observed items to a planned disposition, and in an unconstrained one, where no assumptions on the observed scenes are made. As for (b), a good solution represents one of the first crucial steps for the development and deployment of low-cost autonomous agents able to safely navigate inside the store, either to carry out management jobs or to help customers (e.g., an autonomous cart or shopping assistant). We believe that algorithms for depth prediction from stereo or monocular cameras are good candidates for the solution of this problem. The current state-of-the-art algorithms, however, rely heavily on machine learning and can hardly be applied in the retail environment due to problems arising from the domain shift between the data used to train them (usually synthetic images) and the deployment scenario (real indoor images). We will introduce techniques to adapt those algorithms to unseen environments without the need for costly ground-truth data and in real time.
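The constrained version of task (a), verifying the compliance of observed items to a planned disposition, reduces to diffing the recognized shelf layout against the planogram. A minimal sketch, assuming a simple (row, slot) -> product-id representation of both layouts; the function and field names are illustrative.

```python
def compliance_report(planned, observed):
    """Compare a planned shelf disposition with the recognized one.

    planned / observed: dict mapping (shelf_row, slot) -> product id,
    where observed ids come from a detection + recognition step.
    Returns slots that are missing, misplaced, or unexpected.
    """
    missing = {k: v for k, v in planned.items() if k not in observed}
    misplaced = {k: (planned[k], observed[k])
                 for k in planned.keys() & observed.keys()
                 if planned[k] != observed[k]}
    unexpected = {k: v for k, v in observed.items() if k not in planned}
    return missing, misplaced, unexpected

planned = {(0, 0): "cereal_A", (0, 1): "cereal_B"}
observed = {(0, 0): "cereal_A", (0, 1): "cereal_C"}
missing, misplaced, unexpected = compliance_report(planned, observed)
print(misplaced)  # -> {(0, 1): ('cereal_B', 'cereal_C')}
```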
LatentSwap3D: Semantic Edits on 3D Image GANs
3D GANs have the ability to generate latent codes for entire 3D volumes
rather than only 2D images. These models offer desirable features like
high-quality geometry and multi-view consistency, but, unlike their 2D
counterparts, complex semantic image editing tasks for 3D GANs have only been
partially explored. To address this problem, we propose LatentSwap3D, a
semantic edit approach based on latent space discovery that can be used with
any off-the-shelf 3D or 2D GAN model and on any dataset. LatentSwap3D relies on
identifying the latent code dimensions corresponding to specific attributes by
feature ranking using a random forest classifier. It then performs the edit by
swapping the selected dimensions of the image being edited with the ones from
an automatically selected reference image. Compared to other latent space
control-based edit methods, which were mainly designed for 2D GANs, our method
on 3D GANs provides remarkably consistent semantic edits in a disentangled
manner and outperforms others both qualitatively and quantitatively. We show
results on seven 3D GANs (pi-GAN, GIRAFFE, StyleSDF, MVCGAN, EG3D, StyleNeRF,
and VolumeGAN) and on five datasets (FFHQ, AFHQ, Cats, MetFaces, and CompCars).
Comment: The paper has been accepted by ICCV'23 AI3DC
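The two-step edit described above, rank latent dimensions by attribute relevance, then swap the top-ranked dimensions with those of a reference code, can be sketched as follows. Note the hedge: the paper ranks features with a random-forest classifier; this self-contained NumPy version substitutes a simple per-class mean-difference ranking as a stand-in, and all names are illustrative.

```python
import numpy as np

def rank_dimensions(codes, labels):
    """Rank latent dimensions by how well they separate a binary attribute.

    Stand-in for the paper's random-forest feature importances: score each
    dimension by the absolute difference of its per-class means.
    codes: (N, D) latent codes; labels: (N,) 0/1 attribute labels.
    """
    codes = np.asarray(codes, dtype=np.float64)
    labels = np.asarray(labels)
    gap = np.abs(codes[labels == 1].mean(axis=0) -
                 codes[labels == 0].mean(axis=0))
    return np.argsort(gap)[::-1]  # most attribute-relevant dims first

def latent_swap(code, reference_code, ranked_dims, k):
    """Overwrite the k most attribute-relevant dimensions of `code` with
    those of a reference code showing the target attribute; all other
    dimensions are left untouched, keeping the edit disentangled."""
    edited = np.array(code, dtype=np.float64, copy=True)
    dims = ranked_dims[:k]
    edited[dims] = np.asarray(reference_code, dtype=np.float64)[dims]
    return edited

# Toy data in which dimension 2 carries the attribute.
codes = np.array([[0.0, 1.0, -1.0], [0.1, 0.9, -0.9],
                  [0.0, 1.1,  1.0], [0.1, 1.0,  0.9]])
labels = np.array([0, 0, 1, 1])
order = rank_dimensions(codes, labels)
edited = latent_swap(codes[0], codes[2], order, k=1)
print(order[0], edited)  # dimension 2 should rank first and be swapped
```

In the full method the edited code is then fed back through the (2D or 3D) generator, which is why the approach works with any off-the-shelf GAN without retraining.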